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RARE EVENT SIMULATION AND SPLITTING FOR DISCONTINUOUS RANDOM 

VARIABLES * 

Clement Walter^’^ 


Abstract. Multilevel Splitting methods, also called Sequential Monte-Carlo or Subset Simulation, are 
widely used methods for estimating extreme probabilities of the form P [5'(U) > q] where S' is a deterministic 
real-valued function and U can be a random finite- or infinite-dimensional vector. Very often, X := S(U) 
is supposed to be a continuous random variable and a lot of theoretical results on the statistical behaviour 
of the estimator are now derived with this hypothesis. However, as soon as some threshold effect appears 
in S and/or U is discrete or mixed discrete/continuous this assumption does not hold any more and the 
estimator is not consistent. 

In this paper, we study the impact of discontinuities in the cdf of X and present three unbiased corrected 
estimators to handle them. These estimators do not require to know in advance if X is actually discon¬ 
tinuous or not and become all equal if X is continuous. Especially, one of them has the same statistical 
properties in any case. Efficiency is shown on a 2-D diffusive process as well as on the Boolean SATisfiability 
problem (SAT). 

AMS Subject Classification. 65C05, 65C60, 62L12, 62N02. 


Introduction 

In the context of reliability analysis, one is often concerned with the estimation of extreme quantile or probability. 
Indeed one goal of uncertainty quantification is to estimate the probability of failure of a given system and inversely, 
quantile estimation helps defining guidelines to insure a good behaviour of the system with a given probability of 
failure. Usually, the system is considered as a blackbox (often a complex numerical code) which returns a real value 
defining its state. According to this output, it is then considered as working properly or not. 

Formally, the problem can be written as follows: let U be a random finite- or infinite-dimensional vector with 
known distribution gF and S a performance function (the computer code for instance), one seeks for estimating 
p given q (or q given p) such that p = P [>S'(U) > q\. The main difficulties arise from the fact that 1) the sought 
probability or quantile is extreme, say p < 10“® and 2) the computer code is very time consuming. All together, 
these two characteristics prevent from using a naive Monte-Carlo estimator: 

1 ^ 

= ( 1 ) 

with (Ui)i an iid sample with distribution pF because CV [pmc]^ ~ (Vp)“^, which means that one would require 
N = 10^/p to get a coefficient of variation of 10%. 

In this scope other statistics have been defined to get low variance estimators. Among them the Multilevel 
Splitting method (MS) [15, 16, 20, 23] rewrites the sought probability using the Bayes’ rule and a finite sequence 
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of increasing thresholds (9i)™Q such that qo = —oo and qm = q- 

P [S{U) > g] = P [5(U) > q^ I 5(U) > g™_i] x • • • x P [8(1]) > q^ \ 8(1]) > gi] P [8{U) > gi] (2) 

From Eq. (2) the goal is then to estimate separately each conditional probability with a crude Monte Carlo. Hence, 
the sequence {qj)j has to be chosen such that each probability is not too small to make its estimation with crude 
Monte-Carlo feasible [2, 25]. The variance of the estimator depends on the choice of this sequence and especially it 
is known that the conditional probabilities should be all equal to minimize it [8]. A typical MS algorithm works as 
follows: 

(1) Sample a Monte-Carlo population (Ui)i of size N; j = 0 

(2) Estimate the conditional probability P [5'(U) > g^+i | 5'(U) > g^j 

(3) Resample the such that S'(Ui) < g^+i conditionally to be greater than g^+i (the other ones do not 
change) 

(4) jA— j -|- 1 and repeat until j = m 

When it is not possible to infer such a sequence {qj)j, it is usual to define it on-the-fly while the algorithm 
is running and this is known as Adaptive Multilevel Splitting (AMS). The sequence is built either by fixing the 
conditional probabilities to be all equal to some given value po € (0.1, 0.5) [2], or by using the order statistics 
and so to estimate the corresponding probability by 1 — k/N. In the first case, (qj)j is then a sequence of estimated 
quantiles with crude Monte Carlo while in the second case it is a sequence of arbitrarily chosen thresholds and 
the corresponding probabilities are estimated with crude Monte Carlo. Hence the first option leads to a bias in 
the final estimator [8, 10] while the latter produces an unbiased estimator for any k ]7]. The link with Interacting 
Particle Systems ]13] allowed for deriving a lot of theoretical results on the optimal number of subsets, the statistical 
behaviour of the estimator and the impact of the adaptive method ]3, 9]. Furthermore, the special case of the Last 
Particle Algorithm (AMS with fc = 1) has gained a lot of attention recently. It has the smallest variance amongst 
all AMS ]7]; especially Simonnet ]28] and Guyader et al. ]17] showed that the random number of iterations of the 
algorithm follows a Poisson law when X is continuous. Moreover Walter ]30], following Huber and Schott ]19], 
brought an original insight in terms of a random walk of the real-valued random variable X = >S'(U) linked with a 
Poisson Process with parameter 1. Indeed the estimator is found to be the Minimal Variance Unbiased Estimator 
(MVUE) of the exponential of a parameter of a Poisson law with N iid. realisations of such random variables. As 
a consequence it turns it into the optimal (minimal variance) parallel Multilevel Splitting estimator ]31, 32]. 

However, all these results assume that the edf of X is continuous and little is known about the impact of using 
such strategies if this assumption does not hold. This case can happen if for example there is some threshold effect 
in 8 and/or if U is discrete or mixed discrete/continuous ]11, 24, 25, 26]. Recently, Simonnet ]28] showed that 
in the case of the Last Particle Algorithm the random number of iterations is indeed a mixture of independent 
Poisson and negative binomial laws while in the continuous case, it is only a Poisson law. Unfortunately he could 
not derive a general unbiased estimator from this result ]28, Theorems 4 and 5]. The main problem comes from 
the fact that the equality Va; € M, P ]X > x] = P ]A > x] does not hold any more. Rubinstein ]24] already noticed 
that one should pay attention to the fact that the root g^ of P [S')!!) > g^ | S')!!) > qi-i] = po may not be unique 
]24, Remark 6.1], ]5, Remark 2.6]. He then derived some guidelines for an appropriate adaptive choice of the {qi)i 
in this case but concludes that the algorithm can eventually fail to estimate the sought probability (it stops at 
an intermediate level ]see 24, Remark 6.3] or returns 0 ]!]). Finally, Cerou et al. ]11] suggested to use for the 
SAT problem an auxiliary continuous random variable Y such that P [V > g] = P [S(U) > g] and showed practical 
improvement. This strategy is also used by Huber and Schott ]19] for the Ising model but there are always case 
specific transformations. Skilling ]29] also mentions this issue and proposes to add a uniform random variable on a 
tiny interval but one lacks of justifications and clear guidelines and consequences. 

Following the random walk framework from ]30] the goal of this paper is to fill this gap by providing both the 
distribution of the number of iterations and the MVUE estimator in each case: the one with strict inequality and 
the one with non-strict inequality. Indeed the decomposition of Eq. (2) suggests these two possible definitions for 
a Multilevel Splitting method. While there are the same with probability 1 if A is continuous, they may differ if 
X is not. Especially recent results from ]12] derive an unbiased estimator for the strict inequality case in the usual 
AMS framework. However the distributions of the estimators are less simple than in the continuous case. In this 
scope we also suggest a third estimator based on the random walk with non-strict inequality which has the same 
statistical properties as in the continuous case. Practically speaking it is not necessary to know in advance if X is 
actually continuous or not and in this latter case, the three estimators become the same. 

The paper is organized as follows: first we recall the theoretical results for continuous random variables. Then 
we show how discontinuities alter them and we give corresponding corrected estimators. Finally, we apply this 
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modified version to a dynamic test case from [28], it is the probability of a Markov process reaching a given set B 
before going to another set A, and to a counting problem, namely the Boolean SATisfiability problem [18]. 

1. Rare event simulation for discontinuous random variables 

1.1. The increasing random walk 

In this section, we recall common results from [17, 19, 28] in the framework of [30]. Let us consider X = S{XJ) G K 
a real-valued random variable with distribution pL^ where S' is a deterministic function (for instance the output of 
a computer code) and U a random finite- or infinite-dimensional vector with known distribution pF. In this section 
we assume that the cdf Fx of X is continuous. 

Definition 1.1 (Increasing random walk). Let AT be a real-valued random variable with continuous cdf Fx, 
Xq = —oo and define the Markov sequence (V„)„ such that: 


VneN, P[V„+iGA| Vo,-- - ,V„] 


p [V e A n (v„, +oo)] 
P [V G (V„, +C30)] 


(3) 


In other words (V„)„ is an increasing sequence where each element is randomly generated conditionally greater 
than the previous one: Xn+i p^(- \ X > V„). Assuming that Fx is continuous, the associated sequence (T„)„ 
such that Tn = — logP [X > V„] is distributed as the arrival times of a Poisson Process with parameter 1. 

Remark 1.2. Since X is continuous, the random walk can alternatively be defined with non-strict inequality: 
V„_|_i ~ {■ I X > Xn) without changing this result. 



In the sequel, we then consider for all a; G M the associated numbers p^ = V \X > x] and tx = — logp^,. Hence, 
for all a: G K, the counting random variable of the number of events before x: Mx = card {n G N | V„ < a:} follows a 
Poisson law with parameter tx = — \ogPx- From this result one can build an estimator for px- Indeed, simulating N 
iid. Poisson random variables with the same parameter tx' (MJ)flj^, one seeks for estimating In this context 

the Lehmann-Scheffe theorem insures that the Minimal-Variance Unbiased Estimator (MVUE) is: 


Px - ]V 


E m : 


Proposition 1.3 (Statistical properties of the probability estimator). 


.-V a.s 

Px —- >Px 

N—>-oo 


var [%] = pI {p^ - l) 


(4) 

(5) 

( 6 ) 


This estimator exhibits a logarithmic efficiency and asymptotically achieves the Cramer-Rao bound — \ogpx/N. 

Remark 1.4. The Last Particle Algorithm [17, 28] is only one possible implementation of this estimator; especially 
[31] studied its parallelisation and showed that it generates a marked Poisson process with parameter TV. On the 
other hand, [19] presented the increasing random walk linked with a Poisson process with parameter 1 but focused 
on logpx instead of px- 
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Remark 1.5. When generating a counting random variable at given state a: € M, one indeed generates 
for all xq < X. This gives a Glivenko-Cantelli like results: 


Va;o < X, Fn{xo) = 1 - 



a.s. 

N—¥oo 


> Fx{xo) 


( 7 ) 


with Mxg the sum of N iid. counting random variables at state xq. 

With several realisations of the increasing random walk it is also possible to define parallel quantile and mean 
estimators for continuous random variables. The interested reader is referred to [17] and [31] for quantile estimation 
and [30] for mean estimation. 


1.2. Extension of the random walk for discontinuous random variables 


We now intend to extend the definition of the random walk of Section 1.1 to the case where X is not necessarily 
continuous - or potentially discrete. Especially, while for all a; € ffi, P [X > x] = P [X > x] when Fx is continuous, 
there are now two alternative definitions for the increasing random walk (see Definition 1.1), precisely the one with 
strict inequality and the one with non-strict inequality (see Remark 1.2). In this latter case Simonnet [28] showed 
that the counting random variable at a given state x follows indeed a mixture of a Poisson law and independent 
Geometric laws with parameters depending on the atoms of X. This result prevented him from getting an unbiased 
estimator for p^. Instead he focused on the special case with only one jump point xq reached at the first iteration 
of the random walk. In this context he proposed to estimate the atom with a crude Monte Garlo on the first iid. 
sampling and then to consider the continuous random variable X with the conditional distribution given X > xq, 
that is the distribution with edf lx>xo {Fx{x) — Fx{xo)) / (1 — Fx{xo)) [28, Theorem 5[. 

In this paper we go one step beyond his work by providing a general MVUE for the non-strict inequality random 
walk as well as for the strict inequality random walk. We also introduce a method for getting an estimator with the 
same properties as above (Proposition 1.3) even if the random variable is discontinuous, but without any intrusive 
nor case specific trick as in [11] or [19[. 

From now on, we assume that X is a real-valued random variable, continuous or not, and we denote by D the 
set of its atoms. According to Froda’s theorem [14] it is countable. As in Section 1.1 we consider for all a; € K 
the counting random variable Mx = cardjn € N j A„ < a;}. The next two propositions aim at giving the law of 
Mx in the strict and non-strict cases. In both Propositions 1.6 and 1.7, we consider a; € M | = P [AT > x] >0, 

Dx = D n (—oo, x\ and define: Vc? € Dx'. 


P[X >d] 

~ p [x>dy 


( 8 ) 


Proposition 1.6 (Law of the counting random variable for the non-strict random walk). 


Mx 



+ X/ ^ (Ad) 

deD^ 


(9) 


with Q a Geometric law counting the number of failures before success. 

The distribution of Mx is that of the sum of a Poisson random variable with parameter —\ogpx/Y\Xd and of 
Geometric random variables with parameters {Ad)deD^', these random variables being independent. It is part of the 
proof given in Appendix that the distribution is well defined with finite mean and variance even when card)!!^;) = oo. 
Also this result was already stated in [28] with a combinatorial analysis assuming that card(D) < oo. Indeed it can be 
understood using the renewal property of a Poisson Process: the number of events corresponding to the continuous 
part, i. e. events A„ ^ D, follows a Poisson law with parameter — logp^, — log Ad) = — \og{px/ Jl Ad)- On the 
other hand each jump point leads to a random number of iterations following a Geometric law with probability of 
success P [A > d] /P[X > d] = Ad. 

Proposition 1.7 (Law of the counting random variable for the strict random walk). 


Mx 


-log 


Px 


n Ad 

Dx 


+ Y. B{l-Ad) 

dGDx 


( 10 ) 


with B a Bernoulli distribution. 
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Hence the strict inequality random walk replaces the Geometric distribution by a Bernoulli one. Since the 
Geometric law counts the number of failures while the Bernoulli one gives 1 in case of success, the parameter is the 
opposite. Furthermore, both Eq. (9) and Eq. (10) are equal when D = 0, i.e. when X is continuous. In this latter 
case, one finds back the pure Poisson distribution ~ P)— ^ogpx)- 


1.3. Probability estimators 

As noticed by [28] the formulas (9)-(10) are not very useful to derive an unbiased probability estimator. However 
we do not generate only iid. copies of the counting random variables but the random walks themselves. Hence if 
one can afford storing all the states of each random walk, then it is possible to build a MVUE in both cases. 


Lemma 1.8 (MVUE for a Geometric distribution). Let G ^ Q{p) he a Geometric random variable counting the 
number of failures before success with probability of success p and {Gi)fLi N iid. copies of G, then the minimal 
variance unbiased estimator for p is: 


P = 


N - 1 

N 

iV - 1 + E G, 

i=l 


( 11 ) 


Lemma 1.9 (MVUE for a Bernoulli distribution). Let B ~ B{l — p) be a Bernoulli random variable with probability 
of failure p and (Bi)^.^ N iid. copies of B, then the minimal variance unbiased estimator for p is: 


p = l- 


N 

i=l 

N 


( 12 ) 


Definition 1.10 (Run-length encoding). Let v = (ui,--- ,Vm) € R™, m > 1 be a vector such that Vi G |l,m — 
1]], Vi < Vi+i. We call the run-length encoding of v the vector r of the lengths of runs of equal values in v. 


In other words, the run-length encoding counts for any non decreasing sequence the number of times each value 
is repeated: for example if v = (0.5, 2.1, 2.1, 2.1, tt) then r = (1, 3, 1). Especially, if X is continuous the RLE of the 
states of a realisation of the increasing random walk (Ai, • • • , Xm), is r = (1, • • • , 1) G with probability 1 while 
on the contrary discontinuities will produce repeated values with non-zero probability. More precisely, the number 
of times each value is repeated corresponds to the number of failures while sampling above a threshold. With this 
consideration we are now in position to define the probability estimators. 

In the sequel we assume that for a given x G M | P [A > x] > 0, {Xi)^.^ is the merged and sorted sequence of the 

N 

states of N (non-)strict inequality random walks generated until state x; = X) Mr i® the sum of the counting 

i—1 

random variables of each random walk, r is the RLE of (Ai, • • • , Xm^), and I is its length. 

Proposition 1.11. The MVUE for the non-strict inequality random walk is: 


Px = l[ 


A- 1 
N-i + n' 


(13) 


It verifies: 

pI {p~pl!s^ “ {p~poL^ ~ 

Px 

with ppois = TT . and = ca,T:d{Dx). 

11 

It is interesting to notice here that in a case of a discrete random variable, Ppois = 
variance become: 


var [px] 

pI 


< 


N -I 
N-2 




- 1 . 


(14) 


1 and the bounds on the 
(15) 


It means that the coefficient of variation is bounded by a quantity which does not depend on the size but only on 
the number of jumps. Since this quantity is likely to be known with good confidence and does not vary much with 
ffDx, this can provide a robust upper bound for the coefficient of variation. 
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Figure 2. cdf of a Bernoulli random variable B ~ ^{p) (plain dark line) 
continuous random variable Y = B + U (dark dashed line) with U ~ ZY[0,1]. 


and its associated 


Proposition 1.12. The MVUE for the strict inequality random walk is: 


i 



i=l 


(16) 


It verifies: 


with g : (A, N) i—)■ 


A{N -1) + 1 

jVAi-i/JV 


var [%] = pI 



(17) 


On the one hand we have been able to define minimal variance unbiased estimators for both the strict and non- 
strict random walks. They become equal when the random variable is continuous and in this case one finds back 
the estimator defined in Section 1.1. Especially the variance increase due to discontinuities in the distribution of X 
is clearly visible in Eq. (17) as g > lifA< 1. On the other hand their distributions are not easy to characterise; in 
particular we could not derive any practical literal expression of the variance in the non-strict case. In this context 
we suggest to consider an auxiliary continuous random variable which involves an independent uniform random 
variable. This transformation is general and does not require any other knowledge on the problem. 

Indeed when generating a Geometric random variable with iid. Bernoulli trials, one can also consider the random 
variable Y = B + U with B ^ B{p) the Bernoulli random variable with probability of success p and U ~ 7/[0,1] 
an independent Uniform random variable on the interval [0,1]. Figure 2 plots the cdf of B and Y. Furthermore, 
Vy G [0, 2], {y > y} = {{B > [y\} r\ {U > Y — [y\ }}. Practically speaking, this means that the generation of the 
geometric random variable can be seen as a basic Acceptance-Rejection scheme used to generate the increasing 
random walk on Y until state 1: for each generated B, sample also U ~ 7/[0,1] and accept the transition for 
Y a U > y — ly\- Eventually, considering the fact that p = V[B = 1] = P [F > 1] = P [F > 1], p can also be 
estimated using the increasing random walk on the continuous random variable F. To conclude, in addition to the 
MVUE of p defined in Lemma 1.8 and at the cost of the generation of an independent uniform random variable, 
one also produces an estimator of the form of Eq. (4) with the same statistical properties. Embedding this in the 
generation of the non-strict inequality random walk gives then an estimator with the same properties as the one in 
the continuous case. Algorithm 1, Theorem 1.13 and Corollary 1.14 precise this point. In the sequel, we will refer 
to this estimator as the pure Poisson estimator. 


Theorem 1.13. In Algorithm 1, the random variable M follows a Poisson law with parameter — logP [X > x\. 
Corollary 1.14. Let N >2 and M = of N iid. realisations of Algorithm 1, the estimator 


Px = 



(18) 


has the same properties as in Proposition 1.3. 
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Algorithm 1 Pseudo-code for the non-strict inequality random walk and the pure Poisson estimator 

Require: x 

M = 0 

Draw Xi ^ and Ui ~ [/[O; 1]; n = 1 
3; while Xn < X do 
M = M + 1 

Draw A„+i ~ | A > A„) and C/„+i ~ 17[0; 1] 

6: while Xn+i = Xn do 

if Un +1 > Un then 
M = M + l 
9: end if 

n = n -I- 1 

Draw Xn+i ~ /i^(- | A > A„) and 17„+i ~ C/[0; 1] 

12: end while 

n = n -I- 1 
end while 
15: return M, (X„)„ 


Finally the distinction between the strict and non-strict random walks lets define two different estimators for 
the probability of exceeding a threshold. Both are unbiased and become the same when X is indeed continuous. 
However their distributions are not well-characterised. In this scope we have introduced a third estimator based on 
the non-strict random walk. With the addition of a while loop and a Uniform random variable, we have been able to 
produce an estimator which has always the same statistical properties, X being continuous or not. This estimator 
is not optimal in terms of variance when X is actually discontinuous but remains close to it when the jumps 
remain close to 1: the MVUE of Lemma 1.8 has a squared coefficient of variation approximately equal to (1 — A )/N 
while it is — log(A)/A for the pure Poisson estimator. These results are illustrated in the next Section. 

2. Examples 


2.1. Discontinuous random variable 

Problem setting 

We consider here the example used by Simonnet [28] to illustrate his results. It is a numerical study with a Euler 
scheme of a diffusive process satisfying: 


dUt = -XVdt + 



Uo = uo 


where Wj is a Wiener process, /3 ^ is the temperature, and U is a potential defined by: 


V{U1,U2) 



-u,u 


2 

2 ^ 


(19) 


for some a and b. The goal is to estimate the probability that the process enters a given set B before another set 
A from an initial state Ug: if tc is the stopping time defined by tc ■= inf {t > 0 | Uj G C}, then one seeks for 
estimating: 

P = PuoItb < ta], 

where Puq is the distribution of (Ui)t starting from Ug = ug. As a function of Ug this quantity is known as 
the Committor. From a practical point of view, it is often intractable and a reaction coordinate is introduced to 
measure how far a trajectory is escaping from A before returning to it: $ : —>■ M such that A = ((~cx): 0]) 

and B = ([1, -|-oo)). With these notations, a trajectory enters B before returning to A if and only if: 

X := sup $( 114 ) > 1. 

te[o,TA) 

X is then the real-valued random variable of interest and the problem is indeed to estimate P [A > 1]. With this 
notation, the theoretical results of Section 1 can be used directly. 




Conditional sampling 

This is the only requirement of the estimators defined in Section 1.3 but also the crucial point. To distinguish 
between theoretical results on the estimator and possible issues due to imperfect conditional sampling, Simonnet 
[28] presents two ways of generating conditional random variables, referred to as the Perfect and the Effective 
implementations. In the Perfect case, Acceptance-Rejection sampling is used to generate samples above a given 
threshold while in the Effective one, trajectories already following the targeted distributions are used to find an 
initial state u such that $(u) is greater or equal than the current threshold. In this example, we want to focus 
on the theoretical results on the number of iterations and on the estimators. Hence we make use of the Perfect 
sampling described in Algorithms 2. 


Algorithm 2 Perfect sampling of X ~ \ X > x) for the diffusive process 

Require: x G R > the current threshold one seeks to sample above 

A* = -oo 

while X* < X do 

Generate a new trajectory Ut starting from Uq 
A* = sup $(t7t) 

tG[0.TAus] 

end while 


Numerical results 

Here we set $(u) = 0.5(1 -I- ui), a = 0.6, b = 0.3, Uq = (—0.9,0), /3 = 10 and dt = 1 as in [28]. $(u) = 0.5(1 -I- ui) 
and so $(uo) = 0.05: with a large timestep some trajectories will go directly into A, producing a discontinuity 
in the cdf of A. We computed reference values using a crude Monte Carlo estimator with N = 10® and found 
p = 0.068 and A = 0.396. We then set N = 300 to get a coefficient of variation below 10% because: CV [pf « 
- logp/A ^ A « 268. 

We first focus on the distribution of the number of iterations described in Propositions 1.6 and 1.7 and on the 
corrected number of iterations to get a pure Poisson distribution (see Theorem 1.13). 



Number of iterations 



Number of iterations 



Number of iterations 


(a) Strict random walk 


(b) Non-strict random walk 


(c) Pure Poisson correction 


Figure 3. Histogram over 10“^ realisations of the number of iterations for the strict random walk 
(Eq. (10)), the non-strict one (Eq. (9)) and the pure Poisson correction (Theorem 1.13). A has 
one discontinuity at x = 0.05 and P [A > x] / P [A > x] « 0.396. The curves show the theoretical 
distributions. 


Figure 3 shows the histograms of the number of iterations for the strict random walk, the non-strict one and 
the pure Poisson correction. Their are in good agreement with the theoretical distributions presented in Eq. (10), 
Eq. (9) and Theorem 1.13 respectively. Especially we can see that for a given A, the cost of the estimators are 
different. Indeed, if one considers that the cost is the number of calls to a conditional simulator, then it is equal to 
the final number of iterations and Figure 3a and 3b present a clear shift: on average the discontinuity will produces 
(1/A— 1)A iterations for the non-strict random walk and only (1 —A)A for the strict random walk; with A = 0.396, 
this gives approximately 276 more iterations. On the histograms a shift of 250 to 300 is clearly visible in the x axis. 
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We now check the accuracy of the probability estimators. Firstly, they should be all unbiased. Secondly, for a 
given N one should see a variance increase from the MVUE of the non-strict random walk of Eq. (13) to the pure 
Poisson estimator (Eq. (18)) and to the MVUE of the strict random walk (Eq. (16)). 



Figure 4. Boxplots of the estimation of p « 0.068 over 10"^ simulations with N = 300, whiskers 
extending to the extreme values. (S): Strict random walk estimator (Eq. (16)), (NS): Non-strict 
random walk estimator (Eq. (13)), (PP): Pure Poisson estimator (Eq. (18)); with *: estimators if 
the discontinuity is not taken into account, i.e. (1 — i/ 7 V)Number of iterations^ 

Figure 4 shows a boxplot of the three estimators over 10^ simulations. As an illustration, the estimators computed 
directly as if X were continuous are also added to the plot. The horizontal line stands for the reference value 
calculated with a crude Monte Carlo. The estimated means are 0.068 for the strict random walk estimator, 0.068 
for the non-strict one and 0.068 for the pure Poisson one. This is in good agreement with the estimated reference 
value p = 0.068. Furthermore, the empirical variances are 5.18 x 10“® for the strict inequality random walk and 
4.21 X 10 ® for the pure Poisson estimator while the theoretical values given by Eq. (17) and Eq. (6) are 5.09 x 10 ® 
and 4.16 x 10“®. On the other hand, the probability estimators are clearly not consistent when the discontinuity 
is not handled properly, i.e. when the estimator is computed with the formula valid only in the continuous case 
(Eq. (4)). 

All together, these numerical results are in good agreement with the the theoretical ones. 

2.2. Discrete random variable 

Counting problems are typical cases where the random variable of interest is known to be integer-valued. Indeed, 
it is shown that many of these problems can be put into the setting of estimating extreme probability [4, 5, 21, 22]. 
Among other, we focus here on the Boolean SATisfiability Problem {SAT problem). We do not pretend being 
competitive against specific SAT solvers. Instead, we use this test case because the random variable will have 
several discontinuity points not only at the origin. 

The SAT problem 

A SAT problem comprises 1) a binary vector of n literals which can be either TRUE (=1) of FALSE (=0): 
u = (ui, • • • , Un) is called a truth assignment, e.g. u = (TRUE, TRUE, • • • , FALSE) = (1, 1, • • • ,0) and 2) a set of m 
clauses {51, •• • , 5m} expressed as OR logical operators (also denoted by V) on the ZzteraZs, e.g. Si = Ui^Vui^y ■ ■ -Vui^. 

The SAT problem in itself is then defined as follows: find an assignment u such that all clauses are true {SAT 
assignment problem) or count the number of different assignments which satisfy all the clauses {Sharp-SAT). The 
conjunctive normal form (CNF) of a SAT problem is then the product (logical operator AND or A) of all clauses 
E = 5i A • • • A Sm and both problems can be rewritten: 

SAT assignment problem: is there at least one u G {0,1}" such that E(u) = TRUE ? 
sharp-SAT: find card{S) = |5| with 5 = (u G (0,1}" | E(u) = TRUE} 

If one considers U a Uniform random vector on (0,1}”, then it is known [27] that: 

Pm=P[UG5] = M. 
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Hence one can build an estimator of |5| by estimating the (extreme) probability pm- In order to make use of the 
results of Section 1.3, one can consider the discrete random variable X = S'(U) of the number of clauses satisfied 
by the assignment U: 

m 

X = SiV) = J2s^{V) ( 20 ) 

Therefore X G |0, m] and: 


P [U e 5] = P [F(U) = TRUE] =P[X = m]=P[X>m]. 


Conditional simulations 

In order to perform the conditional simulations {■ | X > t) to simulate the random walks, we propose to 
use the Gibbs sampler as described in [24] or [11]. Indeed, /i^(- \ X > i) = ijF{■ \ S'(U) > i). Starting from a 
sample U* such that S'(U*) > i, it resamples each coordinate in a deterministic {systematic Gibbs sampler) or 
randomized order conditionally to the other ones to stay in the right domain. A sketch of a Gibbs sampler is given 
in Algorithm 3. Practically speaking, to avoid local maxima and improve the convergence of the Markov chain. 


Algorithm 3 Systematic Gibbs sampler 

Require: U = 




Draw Ui ~ 

(•1 

C/2,--- 

,Un) 

for i ^ 2,n — 

1 do 


Draw Ui ~ 


(• 1 Ui, 

‘ ‘ ’ 5 Ui — 1 , , * * * , Un'} 

end for 




Draw Un ~ pP 

■(•1 

c/i,--- 



we do not start from the current U such that Xi = S{\J). Instead, we pick at random a starting point U* in a 
population already following the target distribution, i.e. such that S')!!*) > Xi. This population is built on-the-fly 
with all the generated samples U such that 5'(U) > Xi. Especially each fail of a Geometric law will increase its size 
instead of replacing the previous vector as it is usual in a Multilevel Splitting method. We point out the fact that 
we focus on the real-valued random variable on the number of satisfied clauses and that multidimensional vectors 
are only seen as a way to generate it and to make conditional sampling on it. This makes us simulating the random 
walks by batches of size k on the model of [31]: at a given iteration only the smallest states are moved on. Reader 
is referred [31] for further details on parallel implementation. 

Finally, we also generate both the strict and the non-strict random walks in the same run. This is to focus on 
the statistical properties of the number of iterations and of the estimators. Here some Ad = P [A > d] /P\X > d] 
are very small, so that the size of the population for the conditional sampling may become very small if one only 
keeps those starting points strictly above the current threshold. 

Numerical results 

We consider here the SAT problem referred to as uf75-01 on satlib.org, also used by Botev and Kroese 
[6], who provide a reference value p = 5.977 x 10“^° with a relative error of 0.03%. It has m = 325 clauses in 
dimension n = 75. Hence A is a discrete random variable with up to 325 jump points. We first focus on the 
number of iterations of the random walks. To do so, we simulate N = 10'^ random walks as well as the pure Poisson 
correction. Figure 5 plots the histograms of the random number of iterations for each case and theoretical curves 
with the Ad estimated using an other simulation with N = 50000. 

These plots show a good consistency between theoretical formulae (Eq. (9) and Eq. (10) and Theorem 1.13) and 
numerical results. Especially with a lot of jump points, the number of iterations are very different from Figure 5a 
to Figure 5b (almost hundred times bigger). 

We now focus on the probability estimators. In this scope we also consider the Smoothed Splitting Method (SSM) 
[11], which uses a case-specific continuous auxiliary random variable. The aim of this benchmark is to assess the 
relevance of using such transformations instead of considering the original random variable with the MVUE we 
have proposed in Section 1.3. We refer the reader to [11] for further details on this transformation. The algorithm 
is then a usual Multilevel Splitting method with po = 0-2. We set Assm such that the total numbers of simulated 
samples for the non-strict random walk and for the SSM are of the same order of magnitude. The non-strict random 
walk generates on average 169.683 samples while an AMS with pq = 0.2 typically generates (1 — po)-'^ logp/logpo 
samples. We set N = 1000 for the random walks, which gives Assm ~ 7712. Here we set Assm = 8000. 
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(a) Strict random walk 
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(b) Non-strict random walk 
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Number of iterations 

(c) Pure Poisson correction 


Figure 5. Histogram over 10"^ realisations of the number of iterations for the strict random walk 
(Eq. (10)), the non-strict one (Eq. (9)) and the pure Poisson correction (Theorem 1.13). X is 
an integer-valued random variable with up to 325 jump points. The curves show the theoretical 
distributions with estimated Geometric parameters with N = 50000. 



Figure 6. Boxplots of the estimation of p « 5.977 x 10“^° over 100 simulations with N = 1000, 
whiskers extending to the extreme values. (S): Strict random walk estimator (Eq. (16)), (NS): 
Non-strict random walk estimator (Eq. (13)), (PP): Pure Poisson estimator (Eq. (18)), (SSM): 
Smoothed Splitting Method [11] with Assm = 8000. 


With a lot of discontinuity points, the variance differences between the three estimators is clearer, especially 
between the non-strict random walk estimator and the pure Poisson one. Also the strict inequality estimator 
has a much bigger variance. Indeed, for some discontinuity points d G D^, « 10“^ while N = 1000, which 

gives coefficients of variations around 1/y/Np « 32%. This is not an issue in the non-strict case as the coefficient of 
variation of the MVUE of Lemma 1.8 typically scales like a/(1 — p)/N. On the other hand, over the 100 simulations, 
we have an estimation of card(I?a;) ~ 66, which gives an upper bound for the squared coefficient of variation in 
the non-strict case (see Eq. (15)): 0.068, while the estimated squared coefficient of variation is 0.032. Concerning 
the SSM estimator, we have found a coefficient of variation of 0.33 while the theoretical value should be 0.117. As 
already noticed by [11] this is due to a non-perfect implementation of the Multilevel Splitting. This limitation is 
less visible while keeping the discrete random variable because we save all generated samples and so improve the 
approximation of the target distribution at each iteration. 

Finally, these numerical results with a discrete random variable show a good consistency with the theoretical 
ones. Also it appears that it may not be relevant to transform the problem to consider a Multilevel Splitting method 
on a continuous random variable because this can make the conditional simulations harder to approximate. The 
pure Poisson correction is in this context a good trade-off between accuracy and knowledge of the distribution. 
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Furthermore, it is just a by-product of the non-strict random walk and so the MVUE can also be computed in the 
same run. Concerning the strict inequality random walk, it may suffer from two limitations if the some are very 
small (typically if l/A^ becomes of the order of the total population size N) because 1) the coefficient of variation 
of the MVUE of Lemma 1.9 is « ^yi/{NAd), and 2) Markov chain drawing may be poor because only few samples 
will be available in the right domain: at each iteration, only the samples strictly above the current threshold are 
kept, so on average only NAd samples will be in the right domain. If Markov chain drawing is used to approximate 
conditional sampling, then the diversity of the population may decay strongly iterations after iterations. 

3. Conclusion 

The goal of this paper was to study the impact of using Multilevel Splitting methods on potentially discontinuous 
random variables, especially the parallel optimal (minimal variance) adaptive Multilevel Splitting method, also called 
the Last Particle Algorithm which only replaces the smallest particle at a given iteration. This implementation 
had been shown to be a particular case of generating several point processes associated with a real-valued random 
variable, those point processes, also called increasing random walk here (see Section 1.1) being related to a Poisson 
process with parameter 1. To handle potential discontinuities, we had to distinguish between strict and non-strict 
inequality for conditional sampling. Following Simonnet [28] who gave the law of the number of iterations of such 
algorithms with non-strict inequality, we have been able to extend this result to the strict inequality random walk. 
Furthermore, we have given the Minimal Variance Unbiased Estimator of the probability of exceeding a threshold 
in both cases. These estimators eventually become the same when the random variable of interest is actually 
continuous. On the other hand their distributions are not easy to characterise. In this scope, we have proposed a 
third estimator based on the non-strict inequality implementation. It has always the same statistical properties, 
X being continuous or not, precisely the properties for continuous random variables recalled in Section 1.1. These 
properties are well known and allow for building confidence intervals on the estimator. All these theoretical results 
have been challenged on two examples and proved good agreement with the results. 

Practically speaking, the strict inequality implementation may be less efficient if conditional sampling has to 
be done with Markov chain drawing and can even return 0 if some jump points d € are such that A))"^ = 
P [A > d] / P [A > d] is of the order of the population size: This is due to the estimation of the A^ with crude 
Monte Carlo. The non-strict implementation prevents from such issues because it generates Geometric random 
variables. On the other hand there is no literal expression of the variance of its estimator. At the cost of generating 
independent Uniform random variables, the non-strict implementation can also output the pure Poisson estimator, 
it is the estimator with known distribution which does not depend on the discontinuities of A. If one only want to 
focus on the output without worrying about discontinuities, this latter appears as good trade-off. 

Finally, considering the global cost of an estimator against its variance, some optimisation may be done to start 
some random walks only from a given point and/or stop them before the targeted value because all the jumps are 
not of equal size. This has not been studied here and is let for further research. 


The author would like to thank his advisors Josselin Gamier (University Paris Diderot) and Gilles Defaux (Commissariat a 
I’Energie Atomique et aux Energies Alternatives) for their advices and suggestions. 


Appendix 

Proof of Proposition 1.6 The distribution of has already been proved by [28] assuming that card Ha, < oo. 
We extend this result to the possible infinite countable number of discontinuity. 

Let S'n = {x € M I 0 < P [A = x] < 1/n} be the set of the jump points of Fx with jump amplitudes smaller 
than 1/n and A*^"^ = Alx^s„- A^"^ has a finite number of jump points and accumulates the (possibly infinite) 
number of jump points d € U | P [A = d] < 1/n at 0. Since it has a finite number of discontinuities, the law of its 
associated counting random variable is then known. 

Furthermore it verifies: 


Vx € 


a(") < 


P[X<x] + l,>o ^ P [A = d] - l,<o ^ P [A = d]. 

dGS-n 

d>x d<x 


Hence A*^") —-—> A, which implies —-—> with the counting random variable at state x associated 

n—¥oo n—¥oo 

with an increasing random walk on X^'^\ 
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Moreover, one has: Va; € M, Vn > 1, 


V 



P > a:; 

n 


deD^\s„ 


+ ^ 5 (Ad). Finally, this gives: 

dGD^\S„ 


M, 


V 



P [X > a:] 

n ^d 

dGD^ 


+ X/ ^ (^d) • 

Dx 


We also show that the first and second order moments of remain finite even when card(-D 2 ;) = oo. One has: 


Dx 




d^ Dx 


P[X = d\ 
P[X >d] 


< 


1 

P[X >x] 


Y,p[x = d]< 

d^Dx 


^-Px 

Px 


and: 

1 > n Ad > n > e-ifT > q. 

d^Dx d^Dx 

All together, theses inequalities give the result: 


E[M,] 
var [M^] 


log 


log 


Px 


n A 

d^Dx 


N 

E 


1 


d 


-1 <-logp,r + 


i-Px 


Px 


n A 

d^Dx 


dGD: 


Px 


dGD^ 


id 


i-Px 


X 


Proof of Proposition 1.7 Using the renewal property of the Poisson Process the number of events in the con¬ 
tinuous part will be the same as the one in the non-strict case. Indeed the only difference with the non-strict 
random walk comes from the behaviour of the random walk when X„ € D^- In this latter case, while the non-strict 
inequality repeats the trial until success, the strict inequality do it only once. Hence the Geometric law is replaced 
by a Bernoulli one. 

N 

Proof of Lemma 1.8 One is going to use Lehmann-Scheffe theorem with the statistic T = ^ Gi. As the sum of N 

i—1 

independent Geometric random variables with parameter p, T follows a Negative Binomial law: Vf € N, P [T = t] = 
(Af-Ht-i)pAr(l _ pY^ j' jg sufficient: 


- •• ,9N,p) = = (1 P^ = (1 

i^l 

T is also complete: let 0 be a measurable function, one has: 

Vp e (0,1), E [(/)(r)] = 0 ^ Vp e (0, i), b - p)‘ = o 

oo 

€ ( 0 , 1 ), 0 ( 0 * 
t=o 

with at = and 0 = 1 — p. Furthermore p = 1 i.e. 9 = 0 gives (/)(0) = 0 and 6* = 1, i.e. p = 0 gives 

P [T < oo] = 0. Hence the power series 6 i® identically null on its radius of convergence [0,1) and so 

yt G N, at = 0, which means Vt G N, = 0 and T is complete. 
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N 


We now consider the estimator R = — ^ R is unbiased because E [l^r.^o] = P [G'l =0] = p. Then 

Lehmann-Scheffe theorem states that E [i? | T] is the MVUE of p. This gives: 


N 


E[i?|r = t] = -^E 


2=1 


N 


2=1 


= P 


N 


lGi=0 I E ~ ^ 

= P [Gi = 0 ] 


N 


Gi = 0, E G* = t 

i=2 


N 

^ Gi = t 
i=l 


Gi = 0|EG* = t 

2^1 

(t+N-i)pN^l_py 


E[R\T = t] = 


N -1 
N-l+t 


Hence, p = E [i? | T] = (TV - l)/(iV - 1 + T) is the MVUE of p. 

Proof of Proposition 1.11 On the one hand, for all a < 6 such that X is continuous on (a, 6), P [V > & | V > a] 

can be estimated by (1 — with #{V„ € (a, 6)} the number of events of the sum process in {a,b). 

Moreover, since 5 i—>■ P [V > 5] is left-continuous, it is also an MVUE of P [V > 6 | V > a] and this relation remains 
true if 6 —)■ a since card(0) = 0. Using the fact that the RLE of {Xi, • • • , Xm^), r = (1, •••, 1) with probability 1 
when X is continuous, (1 — = Jl(-^ ~ — 1 + ?'i) with probability 1. 

On the other hand Vd G D^, Ad = P [V > d] / P [V > d] can be estimated with the MVUE defined in Lemma 

1.8. The first state of each chain non smaller than d can be considered as an iid. sample of V | V > d. The 
number of times d is found in {Xi, ■ ■ ■ ,Xm^) is then the sum of N realisations of a Geometric random variable 
with parameter A^. Hence, is estimated with [N — l)/(iV — 1 -|- r^) with Vd the number of times d is found in 

(Ai,--- ,AmJ. 

Since D is countable, we consider M \ I? = IJ^ R with {Ii)i a sequence of disjoint open intervals. Note that some 
subsequence of /„ may converge toward the empty set if D is infinite countable with some accumulation points. 
However X is continuous on M \ U. Using the renewal property of the Poisson process, one can consider that 
J([(TV — l)/(iV — 1 -I- ri) is a product of independent MVUE estimators. Especially, denoting by Mpois the number 
of 1 in r, (1 - is a MVUE of Ppois := Px /H ^d- 

We now explicit the calculation for the bounds on the variance. One has: 


E[fx] =ppoisPpoif n E 




N-1 
N-l + Td 


with Td ~ NegBin{N, Ad). For a given d G Dx, one has: 


E 


N-1 
N-l + Td 




t=0 




t=o 


N-l+t 

N-1 




t=o 


N-l + t 

N-2 N-lN-2 + t 
N-2 + tN-2N - 1 + t' 


Furthermore, Vt > 0, ———rTl ^ [1> (A — ^)/{N — 2)], which gives: 


N-2N-1+t 


Al<E 


N-1 
N-l + Td 


< Arf 


,N-1 

N-2' 


Eventually the variance writes: 


pi {ppoi^ “ ^ -pi<pi (n 


-l/N TT TV — 1 

pois j, j_ jy _ 2 

deD, 


- 1 
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Proof of Proposition 1.12 The same reasoning as for the proof of Proposition 1.11 applies where the MVUE of 
a Geometric law is replaced by the one of a Bernoulli distribution. For the variance one has: 


^pI] 


_ 2 -1/N 

^^pois Pv 


pois 


n 


Dx 




= pI Px 


n 

dGDx 


AdiN-l) + l 


NA 


l-l/N 


which gives the result. 

Proof of Theorem 1.13 The difference between the generation of the non-strict random walk and Algorithm 1 
stands in the addition of the while loop from line 6 to line 12. This loop is entered when a Geometric scheme is 
started: two consecutive events being equal is a non-zero probability event only when A„ € D. In this context, the 
condition {Un+i > Un} lets generate the counting random variable of the continuous random variable associated 
with the Geometric scheme as explained in Section 1.3. Therefore it follows a Poisson distribution with parameter 
— log Ax„ ■ The renewal properties of the Poisson Process lets conclude the proof. 
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